Web Collection 2012

Web crawl data from the year 2012. Some of this data is currently not publicly accessible.

192,544 results

Part of: Web Collections
Media Type
web (192,544)
Topics & Subjects
crawldata (152,946)
wiki (3,503)
dumps (3,495)
incremental (3,494)
media (3,494)
tape (3,494)
Collection
Web Crawls (192,123)
Internet Archive Web Crawls (134,403)
Worldwide Web Crawls (95,684)
Wide Crawl started April 2012 (39,252)
Youtube Videos (32,297)
Wide Crawl started January 2012 (30,362)
Creator
internet archive (140,978)
archive-it (21,532)
thumper2.php (11,966)
wikimedia projects editors (3,513)
portuguese web archive (273)
www.engadget.com (17)
Language
English (3,037)
Portuguese (275)
Italian (26)
Anyhub (19)
Swedish (13)
German (9)
Wide Crawl started January 2012
web
eye 4.2M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
vkontakte.ru
web
eye 2M
favorite 0
comment 0
Source: vkontakte.ru
Wide Crawl started January 2012
web
eye 1.6M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 960,759
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-20T21:59:15 UTC to 2012-07-21T06:44:52 UTC.
Topic: crawldata
Youtube Videos
web
eye 874,997
favorite 0
comment 0
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl440.us.archive.org:youtube from Sat Jul 21 05:39:04 PDT 2012 to Fri Jul 20 23:46:43 PDT 2012.
Topic: crawldata
recurrence=WEEKLY, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=3, accountId=575, organizationName="Chicago-Kent College of Law", collectionId=2817, collectionName="Chicago-Kent College of Law"
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=12, accountId=593, organizationName="Government Printing Office", collectionId=3142, collectionName="CFPB"
Japan Earthquake
web
eye 832,043
favorite 0
comment 0
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=507, accountId=156, organizationName="Virginia Tech: Crisis, Tragedy, and Recovery Network", collectionId=2438, collectionName="Japan Earthquake"
Live Web Proxy Crawls
web
eye 802,286
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 12:37:06 PST 2012 to Tue Jan 3 06:31:55 PST 2012.
Topic: crawldata
The Archive Team Just In Time Grabs
web
eye 781,546
favorite 0
comment 0
Election Crawl 2012
web
eye 606,181
favorite 0
comment 0
Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Fri Jun 29 21:24:09 PDT 2012 to Thu Jan 10 20:37:50 PST 2013.
Topic: crawldata
Wikipedia Outlinks February 2012
web
eye 599,740
favorite 0
comment 0
Internet Archive crawldata from wikipedia outbound links. captured by crawl435.us.archive.org:wpo from Thu Mar 1 20:56:37 PST 2012 to Thu Mar 1 14:19:48 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 539,707
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sat May 12 21:38:20 PDT 2012 to Sat May 12 18:43:53 PDT 2012.
Topic: crawldata
Wide Crawl started September 2012
web
eye 528,125
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl337.us.archive.org:wide from Wed Oct 17 08:14:47 PDT 2012 to Wed Oct 17 02:41:59 PDT 2012.
Topic: crawldata
Alexa Crawls
by thumper2.php
web
eye 479,299
favorite 0
comment 0
Alexa crawl
Topic: crawldata
Wikipedia Outlinks February 2012
web
eye 453,988
favorite 0
comment 0
Internet Archive crawldata from wikipedia outbound links. captured by crawl435.us.archive.org:wpo from Thu Oct 18 19:09:44 PDT 2012 to Thu Oct 18 13:30:19 PDT 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 437,615
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Wed Jan 11 15:00:58 PST 2012 to Wed Jan 11 07:47:03 PST 2012.
Topic: crawldata
Wide Crawl started September 2012
web
eye 433,966
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Fri Oct 19 01:17:49 PDT 2012 to Thu Oct 18 21:38:24 PDT 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 430,438
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 18:18:33 PST 2012 to Thu Jan 5 11:30:47 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 417,803
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 20:28:12 PST 2012 to Thu Jan 5 13:25:29 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 408,084
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Wed Mar 7 23:01:46 PST 2012 to Wed Mar 7 17:53:26 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 395,860
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 11:14:32 PST 2012 to Tue Jan 3 04:36:49 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 393,453
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 14:35:17 PST 2012 to Tue Jan 3 08:02:37 PST 2012.
Topic: crawldata
Archive Team
web
eye 386,471
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Live Web Proxy Crawls
web
eye 382,837
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-11-22T06:44:42 UTC to 2012-11-22T18:14:08 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 382,223
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 16:52:35 PST 2012 to Thu Jan 5 10:17:00 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 366,665
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Tue Apr 17 00:58:46 PDT 2012 to Mon Apr 16 20:37:31 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 366,304
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-13T10:53:21 UTC to 2012-12-13T16:53:32 UTC.
Topic: crawldata
Wide Crawl started September 2012
web
eye 359,579
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl410.us.archive.org:wide from Wed Nov 7 02:28:32 PST 2012 to Tue Nov 6 20:50:40 PST 2012.
Topic: crawldata
Archive Team
web
eye 356,941
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
NLS_2012
web
eye 349,541
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sat Jun 16 16:29:25 PDT 2012 to Wed Dec 12 11:44:11 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 348,448
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-06-26T02:05:01 UTC to 2012-06-26T10:40:25 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 345,799
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 15:53:45 PST 2012 to Thu Jan 5 08:51:54 PST 2012.
Topic: crawldata
google.co.jp
web
eye 343,877
favorite 0
comment 0
Source: google.co.jp
Live Web Proxy Crawls
web
eye 343,605
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-31T05:25:29 UTC to 2013-01-01T02:09:59 UTC.
Topic: crawldata
NLS_2012
web
eye 338,474
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Wed Oct 31 04:28:48 PDT 2012 to Wed Dec 12 01:48:33 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 333,711
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T21:29:12 UTC to 2013-01-23T21:42:32 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 331,835
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-30T15:07:01 UTC to 2012-12-31T14:15:26 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 314,369
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Mon Mar 5 18:11:24 PST 2012 to Mon Mar 5 12:34:16 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 308,049
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-18T16:45:10 UTC to 2012-12-19T09:20:10 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 305,220
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-14T17:34:12 UTC to 2012-12-15T10:27:19 UTC.
Topic: crawldata
Wide Crawl started April 2012
web
eye 302,614
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Jun 8 07:28:07 PDT 2012 to Fri Jun 8 02:31:40 PDT 2012.
Topic: crawldata
Archive Team
web
eye 300,835
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
The Archive Team Just In Time Grabs
web
eye 297,382
favorite 1
comment 0
Live Web Proxy Crawls
web
eye 293,228
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-11-22T04:31:46 UTC to 2012-11-22T19:27:11 UTC.
Topic: crawldata
Wide Crawl started January 2012
web
eye 291,026
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 19:32:22 PST 2012 to Thu Jan 5 12:20:54 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 289,428
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl336.us.archive.org:wide from Wed Jan 18 17:01:06 PST 2012 to Wed Jan 18 09:17:47 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 287,443
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Feb 28 04:16:59 PST 2012 to Tue Feb 28 03:44:00 PST 2012.
Topic: crawldata
Wide Crawl started April 2012
web
eye 287,337
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun May 13 07:09:39 PDT 2012 to Sun May 13 01:49:20 PDT 2012.
Topic: crawldata
Archive Team
web
eye 285,985
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Wikipedia Outlinks February 2012
web
eye 283,797
favorite 0
comment 0
Internet Archive crawldata from wikipedia outbound links. captured by crawl435.us.archive.org:wpo from Sat Feb 11 14:58:31 PST 2012 to Sat Feb 11 08:33:22 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 283,671
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-30T19:10:53 UTC to 2012-12-31T10:26:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 282,615
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Jan 1 09:53:25 PST 2012 to Sun Jan 1 03:42:17 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 281,798
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-30T04:16:42 UTC to 2012-07-30T11:28:26 UTC.
Topic: crawldata
ask.com
web
eye 281,764
favorite 0
comment 0
Source: ask.com
Live Web Proxy Crawls
web
eye 280,682
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Feb 12 06:50:14 PST 2012 to Sun Feb 12 02:22:07 PST 2012.
Topic: crawldata
reddit.com
web
eye 279,819
favorite 0
comment 0
Source: reddit.com
reddit.com
web
eye 278,135
favorite 0
comment 0
Source: reddit.com
Wide Crawl started April 2012
web
eye 277,355
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu May 3 12:36:06 PDT 2012 to Thu May 3 07:30:25 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 277,202
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T00:05:42 UTC to 2012-12-27T16:35:17 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 273,197
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-10-30T19:58:33 UTC to 2012-10-31T18:47:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 272,431
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-28T19:46:00 UTC to 2012-12-29T15:05:14 UTC.
Topic: crawldata
Archive Team
web
eye 271,236
favorite 0
comment 0
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
ameblo.jp
web
eye 271,144
favorite 0
comment 0
Source: ameblo.jp
Live Web Proxy Crawls
web
eye 270,225
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-07T06:06:22 UTC to 2012-12-10T21:54:23 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 270,104
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-10-15T10:26:27 UTC to 2012-10-15T18:41:53 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 269,830
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-27T10:56:04 UTC to 2012-07-27T17:15:08 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 266,743
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-20T18:25:28 UTC to 2012-12-21T16:41:39 UTC.
Topic: crawldata
NLS_2012
web
eye 265,044
favorite 0
comment 0
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sun Jul 8 09:00:15 PDT 2012 to Wed Dec 12 01:48:28 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 263,922
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sat Mar 24 09:36:36 PDT 2012 to Sat Mar 24 08:42:19 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
web
eye 263,314
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T22:44:21 UTC to 2012-12-15T05:16:24 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 261,110
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T07:49:06 UTC to 2012-12-28T07:29:11 UTC.
Topic: crawldata
Live Web Proxy Crawls
web
eye 260,298
favorite 0
comment 0
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-28T22:26:32 UTC to 2012-12-29T15:19:50 UTC.
Topic: crawldata
Wide Crawl started September 2012
web
eye 260,267
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Wed Nov 14 17:39:11 PST 2012 to Wed Nov 14 10:55:11 PST 2012.
Topic: crawldata
yahoo.co.jp
web
eye 260,073
favorite 0
comment 0
Source: yahoo.co.jp